feat: surface gate detail in the workflow run/resume --json payload #2965

Open
doquanghuy wants to merge 9 commits into github:main from doquanghuy:feat/2964-gate-outcome-json

doquanghuy commented Jun 12, 2026
Contributor

Description

Reference implementation for #2964 — for discussion, direction welcome.

When a run pauses at a gate, the --json outcome now carries a gate block (step_id / message / options / choice) so orchestrators can detect "human review needed" and present the options without parsing the human-facing stream. Two small pieces:

  1. The engine records each step's type in the run state's step results (one added line in step_data — previously the type was not recoverable from state).
  2. _workflow_run_payload adds the gate block via a _gate_outcome helper when the run's current step is a gate. choice populates when the outcome ends at the gate with a decision recorded (e.g. an interactive rejection with on_reject: abort → a failed payload carrying "choice": "reject"; an on_reject: retry pause likewise). A mid-flow approval proceeds past the gate, so the block clears — by design. Non-gate runs and runs that end elsewhere are unchanged — no gate key, payload byte-identical to today.
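
As an illustration of the consumer side, here is a minimal sketch of how an orchestrator might read the gate block. Only the four gate keys (step_id, message, options, choice) come from the description above. The function name `pending_gate`, the example payload values, and the surrounding `status` key are assumptions made for this sketch, not part of the PR.

```python
import json
from typing import Any, Optional


def pending_gate(payload_text: str) -> Optional[dict[str, Any]]:
    """Return the gate block if the run is waiting on a human, else None."""
    payload = json.loads(payload_text)
    gate = payload.get("gate")
    # No "gate" key: the run did not stop at a gate.
    if gate is None:
        return None
    # A recorded choice means a decision was already made (e.g. a rejection).
    if gate.get("choice") is not None:
        return None
    return gate


# Hypothetical payload; the values are invented for illustration.
example = json.dumps({
    "status": "paused",
    "gate": {
        "step_id": "review",
        "message": "Approve the plan?",
        "options": ["approve", "reject"],
        "choice": None,
    },
})

gate = pending_gate(example)
if gate is not None:
    print(gate["step_id"], gate["options"])
```

Because non-gate runs carry no gate key at all, a consumer written this way needs no special casing for them.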

The issue lists alternatives (a generic paused_step block; a dedicated status value) — happy to rework toward either.

Testing

  • Ran existing tests with uv sync && uv run pytest — full suite 3727 passed
  • Two new CLI-level tests (TestWorkflowRunGateOutcomeJson): a gate pause carries the exact block (CliRunner stdin is non-TTY, so the gate pauses); a completed run has no gate key — the gate-pause test is red against current main, green with the change (verified both directions)
  • uvx ruff check src/ — clean
  • Tested locally with uv run specify --help
  • Tested with a sample project (covered by the CLI-level tests, which drive a real gate workflow through workflow run --json)

AI Disclosure

  • [ ] I did not use AI assistance for this contribution
  • [x] I did use AI assistance (describe below)

Code, tests, and this description were authored with AI assistance (Claude); verified by running the repo's test suite and ruff locally in both red and green directions.

doquanghuy requested a review from mnriem as a code owner June 12, 2026 17:37
@doquanghuy

Contributor Author

@mnriem when you have a moment, would appreciate your thoughts on the direction here — the issue lists the alternatives considered, and I'm happy to rework toward whichever shape fits Spec Kit best.

Copilot AI left a comment

Contributor

Pull request overview

This PR extends the workflow CLI’s --json run/resume outcome payload to include structured details when the run is paused at a gate step, enabling external orchestrators to detect “human review needed” without parsing stdout.

Changes:

  • Record each executed step’s type into persisted step_results so step types are recoverable from run state.
  • Add an optional gate block to the workflow run --json / workflow resume --json payload when the current step is a gate.
  • Add CLI-level tests covering a non-interactive gate pause (includes gate block) and a non-gate completed run (no gate key).
Summary per file:

| File | Description |
| --- | --- |
| tests/test_workflows.py | Adds CLI-level tests asserting --json includes a structured gate block on gate pauses and omits it for a normal completed run. |
| src/specify_cli/workflows/engine.py | Persists type in each step's recorded step_results entry so step-type introspection is possible from run state. |
| src/specify_cli/__init__.py | Builds the --json outcome payload and conditionally injects gate details via a helper when the current step is a gate. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment thread src/specify_cli/__init__.py Outdated
Comment thread tests/test_workflows.py Outdated
mnriem commented Jun 16, 2026
Collaborator

Please address Copilot feedback

doquanghuy added a commit to doquanghuy/spec-kit that referenced this pull request Jun 17, 2026
Address review (github#2965): _gate_outcome() emitted a gate block whenever current_step_id pointed at a gate step. Since RunState.current_step_id is never cleared on completion, a completed/failed run whose last step was a gate leaked stale gate detail in run/resume/status --json. Guard on status == paused. Also assert CLI success in the _run_json test helper before JSON-parsing, and add direct coverage for the suppression guard.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@doquanghuy

Contributor Author

@mnriem Thanks for the review — addressed the Copilot feedback:

  • _gate_outcome() now only surfaces the gate block while the run is actually paused (guards on status == paused). Since RunState.current_step_id isn't cleared on completion, a completed/failed run whose last step was a gate no longer leaks stale gate detail in run/resume/status --json.
  • Hardened the _run_json test helper to assert CLI success before JSON-parsing, and added direct coverage for the suppression guard.
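
The stale-detail problem and the guard can be sketched as follows. This is not the PR's code: `RunStateSketch` and `gate_outcome_sketch` are hypothetical stand-ins that model only the fields mentioned in this thread (status, current_step_id, and per-step results carrying a type).

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RunStateSketch:
    """Hypothetical stand-in for RunState; only the fields this sketch needs."""
    status: str
    current_step_id: Optional[str]
    step_results: dict[str, dict[str, Any]] = field(default_factory=dict)


def gate_outcome_sketch(state: RunStateSketch) -> Optional[dict[str, Any]]:
    # current_step_id stays set after the run ends, so check the status first.
    if state.status != "paused":
        return None
    step = state.step_results.get(state.current_step_id or "")
    if not step or step.get("type") != "gate":
        return None
    output = step.get("output") or {}
    return {
        "step_id": state.current_step_id,
        "message": output.get("message"),
        "options": output.get("options"),
        "choice": output.get("choice"),
    }


results = {
    "review": {
        "type": "gate",
        "output": {"message": "Ship it?", "options": ["approve", "reject"]},
    }
}
paused = RunStateSketch("paused", "review", results)
completed = RunStateSketch("completed", "review", results)

print(gate_outcome_sketch(paused) is not None)
print(gate_outcome_sketch(completed) is None)
```

Both states point at the same gate step; only the status distinguishes a live pause from a finished run.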

Full suite green; ruff check src/ clean. Ready for another look.

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 3

Comment thread src/specify_cli/__init__.py Outdated
Comment thread tests/test_workflows.py Outdated
Comment thread tests/test_workflows.py
@doquanghuy

Contributor Author

@mnriem Pushed 5fd0f85 addressing the latest Copilot round:

  • Abort path now surfaces the gate block. _gate_outcome() emits the gate detail for aborted runs too, not only paused ones. A gate rejection with on_reject: abort is the only path that sets ABORTED, and it leaves current_step_id on that gate, so an orchestrator can read the choice that stopped the run. completed/failed stay suppressed.
  • Stable JSON schema. message is coerced to a string — GateStep only coerces it for expression interpolation, so a non-string YAML literal could otherwise leak into the payload.
  • Tests: added a CLI-level aborted-path test (test_gate_abort_carries_gate_block, asserts status == aborted and choice == reject), a message-coercion test, and extended the suppression test to allow aborted. Shared the run helper via _invoke_json to avoid duplicating invoke boilerplate.
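
A rough sketch of the two behaviours described in this round: the widened status check and the message coercion. The names `GATE_VISIBLE_STATUSES` and `gate_block_sketch` are invented for illustration, and the real helper takes a run state rather than loose arguments.

```python
from typing import Any, Optional

# Statuses at which the gate block is surfaced, per the comment above.
GATE_VISIBLE_STATUSES = frozenset({"paused", "aborted"})


def gate_block_sketch(
    status: str, step_id: str, output: dict[str, Any]
) -> Optional[dict[str, Any]]:
    if status not in GATE_VISIBLE_STATUSES:
        return None
    message = output.get("message")
    return {
        "step_id": step_id,
        # A YAML literal such as `message: 42` arrives as an int; emit a string.
        "message": None if message is None else str(message),
        "options": output.get("options"),
        "choice": output.get("choice"),
    }


aborted = gate_block_sketch("aborted", "review", {"message": 42, "choice": "reject"})
failed = gate_block_sketch("failed", "review", {"message": 42, "choice": "reject"})
print(aborted)
print(failed)
```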

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 1

Comment thread tests/test_workflows.py Outdated
mnriem commented Jun 17, 2026
Collaborator

Please address Copilot feedback and rebase on upstream/main

@doquanghuy

Contributor Author

@mnriem Pushed 5f60408 for the latest Copilot round:

  • The gate-abort test parsed stdout without first asserting the CLI exited cleanly, so an invoke failure would have surfaced as an opaque JSON decode error. It now routes through _run_json (which asserts exit_code == 0 before parsing), and I dropped the now-redundant _invoke_json helper — a gate abort emits the payload and returns, so the run exits cleanly.

Full tests/test_workflows.py green (212 passed).

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment thread tests/test_workflows.py Outdated
Comment thread src/specify_cli/workflows/engine.py
@doquanghuy

Contributor Author

@mnriem Pushed 3e303fb addressing the latest Copilot round:

  • Run-helper assertion message now uses result.output instead of result.stdout. Under --json, step output is redirected off stdout, so a failing run's useful diagnostics live on result.output; the JSON parse still reads stdout. This also brings _run_json in line with the other CLI tests in the file.
  • StepContext.steps docstring updated from the old 5-key entry shape to the canonical 7-key shape the engine actually persists (type, integration, model, options, input, output, status), so step authors and debuggers see the real record.
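
For readers who want the 7-key record in typed form, one possible rendering is a TypedDict. Only the key names come from the comment above. The class name `StepRecordSketch`, the value types, and the sample values are guesses made for this sketch.

```python
from typing import Any, Optional, TypedDict


class StepRecordSketch(TypedDict):
    """Hypothetical typing of one StepContext.steps entry; value types are guesses."""
    type: Optional[str]
    integration: Optional[str]
    model: Optional[str]
    options: Any
    input: Any
    output: Any
    status: Optional[str]


# Illustrative values only; the real engine decides what each key holds.
record: StepRecordSketch = {
    "type": "gate",
    "integration": None,
    "model": None,
    "options": {},
    "input": {},
    "output": {"message": "Approve?", "choice": None},
    "status": "paused",
}

print(sorted(record))
```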

Full tests/test_workflows.py green (212 passed); ruff clean.

Heads up: GitHub now shows this branch as conflicting with main — that's from #2959 and #2963 having merged (they touch the same run-command / engine.py / test regions). I've left the branch as-is rather than rebase a public PR unprompted; happy to rebase onto current main and resolve if you'd like, just say the word.

doquanghuy and others added 6 commits June 18, 2026 00:55
A paused run was indistinguishable from any other pause in the
machine-readable outcome, and the gate's prompt/options/choice never
left the human-facing stream. Record each step's type in the run
state's step results (one engine line) and, when the run sits at a
gate, add a gate block (step_id/message/options/choice) to the payload
so orchestrators can drive review gates without parsing stdout.

Reference implementation for the proposal in github#2964.

Addresses github#2964

Address review (github#2965): _gate_outcome() emitted a gate block whenever current_step_id pointed at a gate step. Since RunState.current_step_id is never cleared on completion, a completed/failed run whose last step was a gate leaked stale gate detail in run/resume/status --json. Guard on status == paused. Also assert CLI success in the _run_json test helper before JSON-parsing, and add direct coverage for the suppression guard.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Address Copilot review:
- `_gate_outcome` now also surfaces the gate block when a run is `aborted`
  by a gate rejection (`on_reject: abort`), not only when `paused`. Abort
  is the only path that sets ABORTED and it leaves current_step_id on the
  gate, so an orchestrator can read the recorded `choice` for the stop.
- Coerce `message` to a string (it may be a non-string YAML literal that
  GateStep only coerces for interpolation) so the JSON schema stays stable.
- Tests: add a CLI-level aborted-path test, a message-coercion test, and
  extend the suppression test to allow `aborted`; share the run helper via
  `_invoke_json` to avoid duplicating the invoke boilerplate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address Copilot review: the gate-abort test parsed stdout without first
asserting the CLI exited cleanly, so an invoke failure would surface as an
opaque JSON decode error. Route it through `_run_json` (which asserts
exit_code == 0 before parsing) and drop the now-redundant `_invoke_json`
helper — a gate abort emits the payload and returns, so the run exits 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address Copilot review:
- `_run_json` asserted with `result.stdout` in the message, but under
  `--json` step output is redirected off stdout — the useful diagnostics
  live on `result.output`. Switch the assertion message to `result.output`
  (the JSON parse still reads stdout), matching the other CLI tests.
- `StepContext.steps` documented a 5-key entry shape; the engine now also
  persists `type` and `status`. Update the docstring to the canonical
  7-key shape so step authors/debuggers see the real record.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

After rebasing onto main, a gate abort now emits the --json payload and
then exits non-zero (`_run_outcome_exit_code` maps aborted → 1, from the
merged exit-code work). Give `_run_json` an `expected_exit` parameter
(default 0) so the abort case asserts exit 1 while the paused/completed
cases stay at 0 — keeping a single shared helper rather than duplicating
the invoke boilerplate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
doquanghuy force-pushed the feat/2964-gate-outcome-json branch from 3e303fb to 24d6e85 on June 17, 2026 17:57
@doquanghuy

Contributor Author

@mnriem Rebased onto current main and resolved the conflict (force-pushed 24d6e85). The branch is now MERGEABLE.

What the rebase touched:

  • Conflict was a clean "keep both": #2959 (fix: non-zero exit code when a workflow run ends failed or aborted) added TestWorkflowRunExitCodes exactly where this PR adds TestWorkflowRunGateOutcomeJson, and both classes are preserved in full. _workflow_run_payload now (correctly) calls _gate_outcome and keeps _run_outcome_exit_code; the step_data["type"] field this PR relies on sits cleanly alongside the rest.
  • One real semantic interaction with #2959: a gate abort now emits the --json payload and then exits non-zero (aborted → 1). I gave the test helper an expected_exit param (default 0) so the abort test asserts exit 1 while paused/completed stay at 0: a single shared helper, no duplicated invoke boilerplate.
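
The shared-helper idea can be sketched without the CLI. `FakeResult` is a hypothetical stand-in for the runner result the real tests get from CliRunner, and `run_json_sketch` only mirrors the described behaviour of `_run_json` (assert the exit code, then parse stdout); it is not the test suite's code.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeResult:
    """Stand-in for a CLI runner result; the real tests use CliRunner."""
    exit_code: int
    stdout: str
    output: str


def run_json_sketch(result: FakeResult, expected_exit: int = 0) -> dict[str, Any]:
    # Check the exit code first so a crashed run fails with readable
    # diagnostics instead of an opaque JSON decode error.
    assert result.exit_code == expected_exit, result.output
    return json.loads(result.stdout)


paused = FakeResult(0, '{"status": "paused"}', "")
aborted = FakeResult(1, '{"status": "aborted"}', "")

print(run_json_sketch(paused)["status"])
print(run_json_sketch(aborted, expected_exit=1)["status"])
```

Defaulting expected_exit to 0 keeps the paused and completed call sites unchanged, while the abort test opts in to exit code 1 explicitly.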

All prior Copilot fixes are intact (paused/aborted guard, message coercion, result.output assert message, 7-key StepContext.steps docstring). Full specify_cli test suite green locally (3879 passed); ruff clean.

mnriem requested a review from Copilot June 17, 2026 18:51

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 1

Comment thread src/specify_cli/__init__.py

Address Copilot review:
- A run paused by an older version has no persisted step `type`, so
  `_gate_outcome` would never surface its gate block on resume. Add
  `_is_gate_step`: prefer the `type` field, but when it is absent fall back
  to the gate's unique output signature (`on_reject`, written only by
  GateStep). A record with a different known `type` is still not a gate.
- Normalize `options` to a list of strings (mirroring the `message`
  coercion) so an unvalidated workflow with non-string options can't
  destabilize the JSON schema.
- Tests: options coercion, type-less gate detection, and a type-less
  non-gate negative case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@doquanghuy

Contributor Author

@mnriem Pushed d322cf5 for the latest Copilot comment:

  • Backward-compatible gate detection. A run paused by an older version has no persisted step type, so _gate_outcome would have dropped its gate block on resume. Added _is_gate_step: it prefers the type field, and when type is absent falls back to the gate's unique output signature (on_reject, which only GateStep writes). A record carrying a different known type is still not treated as a gate.
  • Options normalized to strings. Mirroring the existing message coercion, options is now normalized to a list of strings so an unvalidated workflow with non-string options can't destabilize the JSON schema.
  • Tests: options coercion, type-less gate detection (resume path), and a type-less non-gate negative case.
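
The detection rule described above fits in a few lines. The sketch below follows the stated behaviour (prefer type, fall back to the on_reject output key when type is absent) but is not copied from the PR; the function name and sample records are hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def is_gate_step_sketch(record: Mapping[str, Any]) -> bool:
    step_type = record.get("type")
    if step_type is not None:
        # A persisted type is authoritative, whatever the output looks like.
        return step_type == "gate"
    # Older run state: no type recorded, so look for the gate-only output key.
    output = record.get("output")
    return isinstance(output, Mapping) and "on_reject" in output


print(is_gate_step_sketch({"type": "gate"}))
print(is_gate_step_sketch({"output": {"on_reject": "abort"}}))
print(is_gate_step_sketch({"type": "shell", "output": {"on_reject": "abort"}}))
print(is_gate_step_sketch({"output": {"exit_code": 0}}))
```

The third call shows why the type field takes precedence: a record with a different known type is never reclassified because of its output.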

Full tests/test_workflows.py green (291 passed); ruff clean.

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 1

Comment thread src/specify_cli/__init__.py

Address Copilot review: the prior options normalization only mapped a
`list`, returning the raw value for any other shape (scalar/tuple), which
contradicted the "stable list[str]" intent. Extract `_normalize_gate_options`:
None stays None; list/tuple maps each element through str; any other scalar
becomes a single-element list (a bare string is one option, never iterated
character-by-character). The emitted schema is now always list[str] | None.
Extend the options test to cover list, tuple, bare string, numeric scalar,
and None.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@doquanghuy

Contributor Author

@mnriem Pushed a97cd18 for the latest Copilot comment:

  • Non-list options now normalized too. The previous pass only mapped a list, returning the raw value for any other shape — so a scalar or tuple leaked through unnormalized, contradicting the list[str] intent. Extracted _normalize_gate_options: None stays None; a list/tuple maps each element through str; any other scalar becomes a single-element list (a bare string is treated as one option, never iterated character-by-character). The emitted schema is now always list[str] | None.
  • Tests extended to cover list, tuple, bare string, numeric scalar, and None.
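
The normalization rules listed above translate directly into code. This is a sketch written from the description, not the PR's `_normalize_gate_options`; the function name here is deliberately different to mark it as illustrative.

```python
from typing import Any, Optional


def normalize_gate_options_sketch(options: Any) -> Optional[list[str]]:
    if options is None:
        return None
    if isinstance(options, (list, tuple)):
        return [str(item) for item in options]
    # Any other value, including a bare string, is treated as a single option.
    return [str(options)]


for raw in (["approve", 2], ("yes", "no"), "approve", 3, None):
    print(repr(raw), "->", normalize_gate_options_sketch(raw))
```

Checking for list/tuple explicitly, rather than for any iterable, is what keeps a bare string from being split into characters.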

Full tests/test_workflows.py green (291 passed); ruff clean.

Copilot AI left a comment

Contributor

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 2

Comment on lines +2144 to +2154
    output = step.get("output") or {}
    # `message` and `options` may be non-string YAML literals in an unvalidated
    # workflow (GateStep coerces neither for the payload), so normalise both
    # here for a stable JSON schema: message → str, options → list[str] | None.
    message = output.get("message")
    return {
        "step_id": state.current_step_id,
        "message": None if message is None else str(message),
        "options": _normalize_gate_options(output.get("options")),
        "choice": output.get("choice"),
    }
Comment thread tests/test_workflows.py Outdated
    steps:
      - id: fine
        type: shell
        run: "true"

Address Copilot review:
- `_gate_outcome` normalized `message` and `options` but passed `choice`
  through as-is; an unvalidated gate can record a non-string `choice`,
  which contradicts the stable-schema rationale. Coerce `choice` to
  `str | None` (None still means "no decision yet"), consistent with the
  other two fields. Adds a focused choice-coercion test.
- The plain (no-gate) test workflow used `run: "true"`, which fails under
  cmd.exe on Windows (ShellStep uses shell=True). Use the cross-platform
  `run: "exit 0"` (matching the exit-code suite's workflows).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@doquanghuy

Contributor Author

@mnriem Pushed c4297a0 for the latest Copilot round:

  • choice now normalized too. _gate_outcome normalized message and options but passed choice through as-is — an unvalidated gate can record a non-string choice, contradicting the stable-schema rationale. It is now coerced to str | None (None still means "no decision yet"), consistent with the other two fields. Added a focused choice-coercion test (None / string / non-string).
  • Portable plain (no-gate) test. The no-gate test workflow used run: "true", which fails under cmd.exe on Windows (ShellStep uses shell=True). Switched to the cross-platform run: "exit 0", matching the exit-code suite's workflows. (Scoped to the one occurrence this PR introduced; two pre-existing run: "true" lines elsewhere are upstream and left untouched to keep the diff tight.)
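
The portability point can be checked with the standard library alone. The sketch below assumes only what the comment states, namely that the step runs its command with shell=True; it calls subprocess directly and does not involve ShellStep.

```python
import subprocess

# With shell=True the string goes to /bin/sh on POSIX and to cmd.exe on
# Windows. `exit 0` is understood by both, whereas `true` is a POSIX
# command that cmd.exe does not provide.
result = subprocess.run("exit 0", shell=True)
print(result.returncode)

# The same form carries a non-zero code when a failing step is needed.
failing = subprocess.run("exit 3", shell=True)
print(failing.returncode)
```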

Full tests/test_workflows.py green (292 passed); ruff clean.

3 participants